Liberec
Supplementary Information
The claim and evidence conflict pairs can be found at https://huggingface. The scope of our dataset is purely for scientific research. Conflict Verification: Ensuring that the default and conflict evidence are contradictory. The human evaluation results showed a high level of accuracy in our data generation process. We select models with 2B and 7B parameters for our analysis. MA2 [Touvron et al., 2023] is a popular open-source foundation model, trained on 2T ... Models with 7B and 70B parameters are selected for our analysis. To facilitate parallel training, we employ DeepSpeed Zero-Stage 3 [Ren et al., ... The prompt for generating semantic conflict descriptions is shown in Figure 1. The prompt for generating default evidence is shown in Table 6. The prompt for generating misinformation conflict evidence is shown in Table 7. The prompt for generating temporal conflict evidence is shown in Table 8. The prompt for generating semantic conflict evidence is shown in Table 9.
- Europe > Czechia > Liberec Region > Liberec (0.05)
- Africa > Nigeria > Taraba State (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (3 more...)
- Personal > Honors (0.69)
- Research Report > New Finding (0.68)
A Benchmark for Evaluating Knowledge Conflicts in Large Language Models
Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. While a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge, a comprehensive assessment of knowledge conflicts in LLMs is still missing.
- Europe > Czechia > Liberec Region > Liberec (0.04)
- Asia > Middle East > Jordan (0.04)
- Africa > Nigeria > Taraba State (0.04)
- (12 more...)
- Personal > Honors (1.00)
- Research Report > New Finding (0.93)
Overview of the Sensemaking Task at the ELOQUENT 2025 Lab: LLMs as Teachers, Students and Evaluators
Šindelář, Pavel, Bojar, Ondřej
ELOQUENT is a set of shared tasks that aims to create easily testable high-level criteria for evaluating generative language models. Sensemaking is one such shared task. In Sensemaking, we try to assess how well generative models "make sense out of a given text" in three steps inspired by exams in a classroom setting: (1) Teacher systems should prepare a set of questions, (2) Student systems should answer these questions, and (3) Evaluator systems should score these answers, all adhering rather strictly to a given set of input materials. We report on the 2025 edition of Sensemaking, where we had 7 sources of test materials (fact-checking analyses of statements, textbooks, transcribed recordings of a lecture, and educational videos) spanning English, German, Ukrainian, and Czech languages. This year, 4 teams participated, providing us with 2 Teacher submissions, 2 Student submissions, and 2 Evaluator submissions. We added baselines for Teacher and Student using commercial large language model systems. We devised a fully automatic evaluation procedure, which we compare to a minimalistic manual evaluation. We were able to make some interesting observations. For the first task, the creation of questions, better evaluation strategies will still have to be devised because it is difficult to discern the quality of the various candidate question sets. In the second task, question answering, the LLMs examined overall perform acceptably, but restricting their answers to the given input texts remains problematic. In the third task, evaluation of question answers, our adversarial tests reveal that systems using the LLM-as-a-Judge paradigm erroneously rate both garbled question-answer pairs and answers to mixed-up questions as acceptable.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (5 more...)
- Instructional Material (0.87)
- Research Report > New Finding (0.46)
- Government (1.00)
- Education > Educational Setting (0.67)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction
Kohút, Jan, Dočekal, Martin, Hradiš, Michal, Vaško, Marek
Manual digitization of bibliographic metadata is time-consuming and labor-intensive, especially for historical and real-world archives with highly variable formatting across documents. Despite advances in machine learning, the absence of dedicated datasets for metadata extraction hinders automation. To address this gap, we introduce BiblioPage, a dataset of scanned title pages annotated with structured bibliographic metadata. The dataset consists of approximately 2,000 monograph title pages collected from 14 Czech libraries, spanning a wide range of publication periods, typographic styles, and layout structures. Each title page is annotated with 16 bibliographic attributes, including title, contributors, and publication metadata, along with precise positional information in the form of bounding boxes. To extract structured information from this dataset, we evaluated object detection models such as YOLO and DETR combined with transformer-based OCR, achieving a maximum mAP of 52 and an F1 score of 59. Additionally, we assess the performance of various visual large language models, including Llama 3.2-Vision and GPT-4o, with the best model reaching an F1 score of 67. BiblioPage serves as a real-world benchmark for bibliographic metadata extraction, contributing to document understanding, document question answering, and document information extraction.
- Europe > Czechia > South Moravian Region > Brno (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (8 more...)
Linguistic Knowledge Transfer Learning for Speech Enhancement
Hung, Kuo-Hsuan, Lu, Xugang, Fu, Szu-Wei, Tseng, Huan-Hsin, Lin, Hsin-Yi, Lin, Chii-Wann, Tsao, Yu
Linguistic knowledge plays a crucial role in spoken language comprehension. It provides essential semantic and syntactic context for speech perception in noisy environments. However, most speech enhancement (SE) methods predominantly rely on acoustic features to learn the mapping relationship between noisy and clean speech, with limited exploration of linguistic integration. While text-informed SE approaches have been investigated, they often require explicit speech-text alignment or externally provided textual data, constraining their practicality in real-world scenarios. Additionally, using text as input poses challenges in aligning linguistic and acoustic representations due to their inherent differences. In this study, we propose the Cross-Modality Knowledge Transfer (CMKT) learning framework, which leverages pre-trained large language models (LLMs) to infuse linguistic knowledge into SE models without requiring text input or LLMs during inference. Furthermore, we introduce a misalignment strategy to improve knowledge transfer. This strategy applies controlled temporal shifts, encouraging the model to learn more robust representations. Experimental evaluations demonstrate that CMKT consistently outperforms baseline models across various SE architectures and LLM embeddings, highlighting its adaptability to different configurations. Additionally, results on Mandarin and English datasets confirm its effectiveness across diverse linguistic conditions, further validating its robustness. Moreover, CMKT remains effective even in scenarios without textual data, underscoring its practicality for real-world applications. By bridging the gap between linguistic and acoustic modalities, CMKT offers a scalable and innovative solution for integrating linguistic knowledge into SE models, leading to substantial improvements in both intelligibility and enhancement performance.
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Asia > Singapore (0.04)
- North America > United States > New Jersey > Essex County > South Orange (0.04)
- (4 more...)
Sanidha: A Studio Quality Multi-Modal Dataset for Carnatic Music
Krishnan, Venkatakrishnan Vaidyanathapuram, Alben, Noel, Nair, Anish, Condit-Schultz, Nathaniel
Music source separation demixes a piece of music into its individual sound sources (vocals, percussion, melodic instruments, etc.), a task with no simple mathematical solution. It requires deep learning methods involving training on large datasets of isolated music stems. The most commonly available datasets are made from commercial Western music, limiting the models' applications to non-Western genres like Carnatic music. Carnatic music is a live tradition, with the available multi-track recordings containing overlapping sounds and bleeds between the sources. This poses a challenge to commercially available source separation models like Spleeter and Hybrid Demucs. In this work, we introduce 'Sanidha', the first open-source novel dataset for Carnatic music, offering studio-quality, multi-track recordings with minimal to no overlap or bleed. Along with the audio files, we provide high-definition videos of the artists' performances. Additionally, we fine-tuned Spleeter, one of the most commonly used source separation models, on our dataset and observed improved SDR performance compared to fine-tuning on a pre-existing Carnatic multi-track dataset. The outputs of the fine-tuned model with 'Sanidha' are evaluated through a listening study.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Michigan (0.05)
- North America > United States > New York (0.04)
- (5 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Conditional Deep Canonical Time Warping
Steinberg, Afek, Eisenberg, Ran, Lindenbaum, Ofir
Temporal alignment of sequences is a fundamental challenge in many applications, such as computer vision and bioinformatics, where local time shifting needs to be accounted for. Misalignment can lead to poor model generalization, especially in high-dimensional sequences. Existing methods often struggle with optimization when dealing with high-dimensional sparse data, falling into poor alignments. Feature selection is frequently used to enhance model performance for sparse data. However, a fixed set of selected features would not generally work for dynamically changing sequences and would need to be modified based on the state of the sequence. Therefore, modifying the selected features based on contextual input would result in better alignment. Our suggested method, Conditional Deep Canonical Temporal Time Warping (CDCTW), is designed for temporal alignment in sparse temporal data to address these challenges. CDCTW enhances alignment accuracy for high-dimensional time-dependent views by performing dynamic time warping on data embedded in a maximally correlated subspace, which handles sparsity with a novel feature selection method. We validate the effectiveness of CDCTW through extensive experiments on various datasets, demonstrating superior performance over previous techniques.
- Asia > Middle East > Israel (0.05)
- Europe > Czechia > Liberec Region > Liberec (0.04)
- Asia > Middle East > Jordan (0.04)
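The dynamic time warping step mentioned in the CDCTW abstract above can be illustrated with a minimal sketch. This is plain, textbook DTW on raw one-dimensional sequences, not the CDCTW method itself, which according to the abstract warps data embedded in a learned, maximally correlated subspace. The function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Cumulative cost of the best monotonic alignment between two 1-D sequences.

    Hypothetical helper for illustration; uses |x - y| as the local cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated `2` in the second sequence can be matched to the single `2` in the first, which is the kind of local time shift the abstract refers to.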
Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering
Zhou, Wei, Mesgar, Mohsen, Friedrich, Annemarie, Adel, Heike
Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning, over data represented in tabular form. Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs. However, fine-tuning LLMs requires high-quality training data, which is costly to obtain, and utilizing closed-source LLMs poses accessibility challenges and leads to reproducibility issues. In this paper, we propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning. In MACT, a planning agent and a coding agent that also make use of tools collaborate to answer questions. Our experiments on four TQA benchmarks show that MACT outperforms previous SoTA systems on three out of four benchmarks and that it performs comparably to the larger and more expensive closed-source model GPT-4 on two benchmarks, even when using only open-weight models without any fine-tuning. We conduct extensive analyses to prove the effectiveness of MACT's multi-agent collaboration in TQA.
- North America > Canada > Saskatchewan > Saskatoon (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- (26 more...)
- Research Report (1.00)
- Financial News (0.68)
- Transportation > Passenger (1.00)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Transportation > Air (0.93)
- Consumer Products & Services > Travel (0.93)